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A FAST ANALYSIS-BASED DISCRETE HANKEL TRANSFORM 
USING ASYMPTOTIC EXPANSIONS 


ALEX TOWNSEND* 

Abstract. A fast and numerically stable algorithm is described for computing the discrete 
Hankel transform of order 0 as well as evaluating Schlomilch and Fourier—Bessel expansions in 
Q(N(\og N) 2 / loglog N) operations. The algorithm is based on an asymptotic expansion for Bessel 
functions of large arguments, the fast Fourier transform, and the Neumann addition formula. All 
the algorithmic parameters are selected from error bounds to achieve a near-optimal computational 
cost for any accuracy goal. Numerical results demonstrate the efficiency of the resulting algorithm. 
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1. Introduction. The Hankel transform of order v > 0 of a function /: [0,1] —> 
C is defined as PT] Chap. 9] 

F v {u) = f f(r)J v (ru)rdr , u > 0, (1.1) 

Jo 

where J„ is the Bessel function of the first kind with parameter v. When v = o, m 
is essentially the two-dimensional Fourier transform of a radially symmetric function 
with support on the unit disc and appears in the Fourier-Hankel-Abel cycle EH 
Sec. 9.3]. Furthermore, the Hankel transform of integer order v > 1 is equivalent to 
the Fourier transform of functions of the form e lu f(r), where 9 £ [0, 2^ r). Throughout 
this paper we take v to be an integer. 

Various semi-discrete and discrete variants of CQ]) are employed in optics [7], 
electromagnetics El, medical imaging m, and the numerical solution of partial 
differential equations [3]. At some stage these require the computation of the following 
sums: 


N 

fk = X CnJ <'( r k uj ri), 1 <k< N, (1.2) 

71=1 

where v is an integer, 0 < uq < • • • < to n < oo, and 0 < rq < • • • < r N < 1. Special 
cases of (11.21) that are considered in this paper are: (1) The evaluation of Schlomilch 
expansions at r*. [23] (ui n = nir or cu n = (n + v)ir)\ (2) The evaluation of Fourier- 
Bessel expansions of order 0 at r k {y = 0 and u n = j 0 n , where j 0n is the nth positive 
root of J 0 ); and (3) The discrete Hankel transform of order 0 15; (v = 0, uj n = j 0 n , 
and r k = j 0 k /j 0 jv+i). For algorithms that rely on the fast Fourier transform (FFT), 
such as the algorithm described in this paper, tasks (1), (2), and (3) are incrementally 
more difficult. We consider them in turn in sections |4j [5] and [6] respectively. 

Our algorithm is based on carefully replacing J„(~) by an asymptotic expansion 
for large arguments that, up to a certain error, expresses J„{z) as a weighted sum 
of trigonometric functions. This allows a significant proportion of the computation 
involved in m to be achieved by the discrete cosine transform of type I (DCT) 
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and the discrete sine transform of type I (DST). These are 0(N log N) FFT-based 
algorithms for computing matrix-vector products with the matrices Cjy and Sjy, where 


(Ojy)kn COS 


(k — 1 )(n — l)7r 
N-l 


{S N )kn = sin 


knit 
N + l 


1 < k, n < N. 


(1.3) 

Asymptotic expansions of special functions are notoriously tricky to use in prac¬ 
tice because it is often difficult to determine the precise regime for which they are 
accurate approximations. However, for transforms involving orthogonal polynomials, 
such as the Chebysliev-Legendre transform mm, successful fast algorithms have 
been derived based on asymptotic expansions. At the point of evaluation, these algo¬ 
rithms are competitive with multipole-like approaches [T], but have the advantage of 
no precomputation. This allows for more adaptive algorithms such as a convolution 
algorithm for Chebysliev expansions [9]. 

In this paper we will use error bounds to determine the precise regime for which 
an asymptotic expansion of Bessel functions is an accurate approximation and from 
there derive all the algorithmic parameters to achieve a near-optimal computational 
cost for any accuracy goal (see Table EH). For each task (l)-(3) there are just two 
user-defined inputs: A vector of coefficients c ±,..., Cjv in m and a working accuracy 
e > 0. The values of f k in (11.21) are then calculated to an accuracy of 0(e l c nl)- 
The resulting algorithm has a complexity of 0(N (log N ) log(l/ e) p / loglog N ), where 
p = 1 for task (1), p = 2 for task (2), and p = 3 for task (3). For the majority of this 
paper we state algorithmic complexities without the dependency on e. 

Previous algorithms based on asymptotic expansions for computing (Oil men 
have either a poor accuracy or a dubious numerical stability [5]. Here, we find that 
once an asymptotic expansion of Bessel functions is carefully employed the resulting 
algorithm is numerically stable, can be adapted to any accuracy goal, and requires 
no precomputation. Moreover, we show how the equally-spaced restriction, which is 
inherited from the reliance on DCTs and DSTs, can be alleviated. 

We use the following notation. A column vector with entries zq,..., v N is denoted 
by U, a row vector is denoted by v T , a diagonal matrix with diagonal entries zq,..., zq y 
is written as D v , and the N x TV matrix with ( k,n ) entry J v (a kn ) is denoted by J„(A) 
where A = (a^^^jy. 

The paper is organized as follows. In Section[2]we briefly discuss existing methods 
for computing (11.21) and in Section El we investigate the approximation power of three 
expansions for Bessel functions. In Section El we first describe an 0(N 3,/2 ) algorithm 
for evaluating Schlomilch expansions before deriving a faster O(N(log N) 2 / loglog N) 
algorithm. In Sections [5] and [6] we use the Neumann addition formula to extend the 
algorithm to the evaluation of Fourier-Bessel expansions and the computation of the 
discrete Hankel transform. 


2. Existing methods. We now give a brief description of some existing methods 
for evaluating (O that roughly fall into four categories: (1) Direct summation; (2) 
Evaluation via an integral representation; (3) Evaluation using asymptotic expansions, 
and; (4) Butterfly schemes. For other approaches see (5]. 

2.1. Direct summation. One of the simplest ideas is to evaluate J u (r k ui n ) for 
1 < k,n < N and then naively compute the sums in (11.21) . By using the Taylor 
series expansion in (13.31) for small z, the asymptotic expansion in m for large 
z 1 and Miller’s algorithm m Sec. 3.6(vi)] (backward recursion with a three-term 
recurrence) otherwise, each evaluation of J u (z) costs 0(1) operations [2]. Therefore, 
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this simple algorithm requires a total of 0(N 2 ) operations and is very efficient for 
small N (N < 256). However, the 0(N 2 ) cost is prohibitive for large N. 

Note that in general there is no evaluation scheme for J u (z) that is exact in infinite 
precision arithmetic. At some level a working accuracy of e > 0 must be introduced. 
Thus, even in the direct summation approach, the relaxation of the working accuracy 
that is necessary for deriving a faster algorithm, has already been made. 

The sums in m are equivalent to the matrix-vector product J„(rw T )c. One 
reason that an 0(N (log N) 2 / loglog N) algorithm is possible is because the argument, 
rw T , is a rank 1 matrix. A fact that we will exploit several times. 

2.2. Evaluation via an integral representation. When v is an integer, J v {z) 
can be replaced by its integral representation [T5J (10.9.2)], 

J u (z) =- / e lzcos ( 9 ) cos(v0)d6 = —— / e lzcos (0) cos(is6)d6i (2.1) 

71 Jo 2tt J 0 

where i = y/—l is the imaginary unit and the last equality holds because e 12COS (0 cos(i^d) 
is even about t = n. The integrand in (12.11) is 27r-periodic and the integral can be ap¬ 
proximated by a if-point trapezium rule. Substituting this approximation into m 
gives a double sum for / that can be evaluated with one FFT followed by an interpo¬ 
lation step in O (K N + N log N) operations [6]. The major issue with this approach is 
that the integrand in (12.11) is highly oscillatory for large z and often K m N, resulting 
in 0{N 2 ) operations. 

2.3. Evaluation using asymptotic expansions. One can replace J v {z) in (11.21) 

by an asymptotic expansion that involves trigonometric functions. A fast, but per¬ 
haps numerically unstable (see Figure l4Jl) . algorithm then follows by exploiting DCTs 
and DSTs. Existing work has approximated J u (z) using asymptotic expansions with 
rational coefficients [20], the approximation \/2/{irz) cos(,z — (2z/ + 1)7 t/ 4) [5, and 
for half-integer v the asymptotic expansion in (13.11) |24] . However, none of these ap¬ 
proaches have rigorously determined the regime in which the employed asymptotic 
expansion is an accurate approximation and instead involve dubiously chosen algo¬ 
rithmic parameters. In this paper we will use a well-established asymptotic expansion 
and rigorously determine the regime in which it is an accurate approximation. 

2.4. A fast multipole method approach. In 1995, Kapur and Rokhlin de¬ 
scribed an algorithm for computing the integral in m when v = 0 m- The main 
idea is to write CD as a product of two integrals as follows [14} (6)]: 

[i 1 [ u l f 1 

/ f(r)J 0 (roj)r dr = — / — / rf(r) cos(ru) dr du. 

Jo tt J-u, Vw 2 - u Jo 

The inner integral can be discretized by a (corrected) trapnezium rule and computed 
via a discrete cosine transform, while the outer integral involves a singular kernel and 
can be computed with the fast multipole method. As described in HU the algorithm 
can be used to evaluate Schlomilch expansions in O (N log N) operations (see Sec- 
tion[4]). With modern advances in fast algorithms for nonuniform transforms, it could 
now be adapted to the computation of the discrete Hankel transform though we are 
not aware of such work. 

2.5. Butterfly schemes. Recently, a new algorithm was developed for a class of 
fast transforms involving special functions [T5]. The algorithm compresses the ranks 
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of submatrices of a change of basis matrix by constructing skeleton decompositions 
in a hierarchical fashion. This hierarchical decomposition of the matrix means that 
it can then be applied to a vector in 0(N log N) operations. The compression step of 
the algorithm is the dominating cost and is observed to require 0(N 2 ) operations ns 
Table. 6]. Fortunately, in many applications the compression step can be regarded as 
a precomputational cost because it is independent of the values of cq,..., Cjv in (US), 
though it does depend on N. 

The methodology behind butterfly schemes is more general than what we present 
in this paper as it does not require knowledge of asymptotics. Here, we will derive a 
fast transform that has no precomputational cost and is better suited to applications 
where N is not known in advance. 

Some of the ideas in the spherical harmonic transform literature [221 (2S1 may be 
useful for reducing the precomputational cost in m , though we are not aware of work 
in this direction. 


3. Three expansions of Bessel functions. In this section we investigate three 
known expansions of Bessel functions [T8] Chap. 10]: (1) An explicit asymptotic 
expansion for large arguments; (2) A convergent Taylor series expansion for small 
arguments; and (3) The Neumann addition formula for perturbed arguments. Math¬ 
ematically, these three expansions are infinite series, but for practical use they must 
be truncated. Here, we will focus on deriving criteria to ensure that the errors intro¬ 
duced by such truncations are negligible. Related techniques were successfully used 
to derive a fast algorithm for the discrete Legendre transform HO]- 

3.1. An asymptotic expansion of Bessel functions for large arguments. 

The graph of J v (z) on [0, oo) is often said to look like an oscillating trigonometric func¬ 
tion that decays like 1 / yfz. This observation can be made precise by an asymptotic 
expansion of J„(z) for large z. 

Let v and M > 1 be integers and 2 > 0. The first 2 M terms of the Hankel 
asymptotic expansion of J v {z ) is given by [TSJ (10.17.3)] (also see [TTj): 


U z ) = 


2 
7 TZ 


M—l 


COS 


(- 1 )’ 


i miy) 


2m 


m =0 



+ Rv,m( Z )' 

m=0 ^ / 

(3.1) 

error term, a 0 (v) = 1, and 



M) = 


(4z/ 2 - 1 2 )(4 z/ 2 - 3 2 ) • • • (4z/ 2 - (2m - l) 2 ) 


m > 1 . 


The first term in ED shows that the leading asymptotic behavior of J„(*) as 
2 —> oo is y/2/('Kz) cos(/z), which is an oscillating trigonometric function that decays 
like l/yfz. The first 2M terms show that, up to an error of | R v ,m( z )\i Jv{ z ) can be 
written as a weighted sum of cos (n)z^ 2m ~ 1 ^ 2 and sin (fi)z~ 2m ~ 3 ^ 2 for 0 < m < M — 1. 
FigureEH(left) shows the error term \Rq,m{ z )\ on 2 £ (0, 50] for M = 1, 2,3,4, 5, 7,10. 
As expected, the asymptotic expansion becomes more accurate as 2 increases. In 
particular, for sufficiently large 2 the error term is negligible, i.e., | R v m (*)l < e- 

It is important to appreciate that ED is an asymptotic expansion, as opposed to a 
convergent series, and does not converge pointwise to J v (z) as M —> 00 . In practice, 
increasing M will eventually be detrimental and lead to severe numerical overflow 
issues. Figure l3Tl (right) shows the error |-R„ iM ( 10)| as M —> 00 and v = 0, 5,10, 20. 
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Fig. 3.1: Left: The absolute value of Rq^m( z ) i* 1 H3.1D on z e (0,50] for M = 
1,2,3,4, 5, 7,10. For any M > 1 and sufficiently large z, the asymptotic expansion is 
an accurate approximation. Right: The absolute value of R U} m( 10) for 1 < M < 40 
and v = 0, 5,10, 20. For a fixed z, increasing M may reduce the accuracy of the 
asymptotic expansion and will eventually lead to numerical overflow issues. 


Thus, the appropriate use of ED is to select an M > 1 and ask: “For what 
sufficiently large z is the asymptotic expansion accurate?”. For example, from Fig¬ 
ure [iO] (left) we observe that for M = 10 we have |7?o,io(“)l < 10~ 15 for any z > 17.8 
and hence, we may safely replace J 0 (z) by a weighted sum of cos(/z)z _2m_1 ^ 2 and 
sin (n)z~ 2m - 3 / 2 for 0 < m < 9 when z > 17.8. 

More generally, it is known that the error term \R v m{ z )\ is bounded by the size 
of the first neglected terms fljS, 10.17(iii)]. Since |cos(w)| < 1 and |sin(w)| < 1 we 
have 


\R v ,m{ z )\ < 



f I^2m( z/ )I l a 2M+l( z/ )l\ 

1 , 2M -I - ,2M+1 J ‘ 


(3.2) 


This is essentially a sharp bound since for any c > y/2 there is a z > 0 such that 
c\R v ,m( z )\ is larger than the magnitude of the first neglected termsQ 

Let s v M {e) be the smallest real number so that if z > s u M (e) then \R V m{ z )\ < e - 
From (13.21) we can take s u M (e) as the number that solves 

\ a 2M{ v )\ \ a 2M+l { v )\'N _ 

2 M + 2A/+1 ) — £ ’ 

z z / 



In general, the equation above has no closed-form solution, but can be solved numer¬ 
ically by a fixed-point iteration. We start with the initial guess of s[,°^(e) = 1 and 
iterate as follows: 


, 0 + 1 ) 

VAf 


(e) = 


a/2 (\< 12 m { v )\ + |«2M+l(l / )|/s^M( e )) 




j > 0. 


We terminate this after four iterations and take s v ^ M {e) « s^ 4 | f (e). Table l3Jl gives 
the calculated values of s^ M (10 -15 ) for 3 < M < 12 and v = 0,1,2,10. 


lr To see this pick z of the form (fc/4 + v + 1/4)7t for sufficiently large k. 
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M 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

s 0 ,M 

180.5 

70.5 

41.5 

30.0 

24.3 

21.1 

19.1 

17.8 

17.0 

16.5 

S 1,M 

185.2 

71.5 

41.9 

30.2 

24.4 

21.1 

19.2 

17.9 

17.1 

16.5 

S 2,M 

200.2 

74.8 

43.1 

30.8 

24.8 

21.4 

19.3 

18.0 

17.2 

16.6 

s 10 ,M 

2330.7 

500.0 

149.0 

64.6 

41.4 

31.4 

26.0 

22.9 

20.9 

19.6 


Table 3.1: The asymptotic expansion m is accurate up to an error of e for any 
z > Sj, jM (e). Here, the values of s !/jM (10 -15 ) are tabulated for 3 < M < 12 and 


1 ^ = 0 , 1 , 2 , 10 . 


3.2. The Taylor series expansion of Bessel functions for small argu¬ 
ments. For an integer j/, the Taylor series expansion of J„(z) about 2 = 0 is given 
by pH (10.2.2)] 


M z ) = 

t =0 


(-1 ) t 2~ 2t - v . 
t\(t + v)\ Z 


(3.3) 


In contrast to (EH), the Taylor series expansion converges pointwise to J u ( z ) for any 
z. However, for practical application the infinite series in (13.31) must still be truncated. 

Let T > 1 be an integer and consider the truncated Taylor series expansion that 
is given by 


T—l 


J A z ) = 

t =0 


(- l) t 2~ 2t ~ u ■ 
t\(t + v)\ Z 


+ E uT {z), 


(3.4) 


where \E vT (z)\ is the error term. As z —> 0 the leading asymptotic behavior of 
\E v T ( z )\ matches the order of the first neglected term and hence, lim 2 _> 0 \E l/ T (z)\ = 
0. In particular, there is a real number t u T (e) so that if z < t u T (e) then \E u T (z)\ < e. 

To calculate the parameter t u T (e) we solve the equation E u T (z) = e. Remarkably, 
an explicit closed-form expression for E u T (z) is known m and is given by 


Ev,t( z ) — 


(—1) 2 


Tr,-2T-v z ^t+u 


(T + u)\Tl 


' 1"2 


t + i r + ^ + i ; ' 



( — 1) 2 


T n -2T-v 2T+v 


(T + v)\T\ 


where ^F. 2 is the generalized hypergeometric function that we approximate by 1 since 
we are considering 2 to be small [TSJ (16.2.1)]. Solving E uT (z) = e we find that 

t v>T (e) w (2 2T+1/ (T + v)\T\e) ^ . 

Table EH gives the calculated values of t„ >T (10 -15 ) for 1 < T < 9 and v = 0,1,2,10. 

3.3. The Neumann addition formula for perturbed arguments. The 

Neumann addition formula expresses J v (z + Sz) as an infinite sum of products of 
Bessel functions of the form J„_ s (z)J s (5z) for s G Z. It is given by TB] (10.23.2)] 


OO 

J v {z + 5z)= ^2 J v _ s (z)J s (5z). (3.5) 

s =—OO 
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T 

1 

2 

3 

4 

5 

6 

7 

8 

9 

to,T 

6 x 10" 8 

0.001 

0.011 

0.059 

0.165 

0.337 

0.573 

0.869 

1.217 

t\,T 

3 x 10 -5 

0.003 

0.029 

0.104 

0.243 

0.449 

0.716 

1.039 

1.411 

t 2 .T 

0.001 

0.012 

0.061 

0.168 

0.341 

0.579 

0.876 

1.225 

1.618 

tlO ,T 

0.484 

0.743 

1.058 

1.420 

1.823 

2.262 

2.733 

3.230 

3.750 


Table 3.2: The truncated Taylor series expansion (13.411 is accurate up to an error of 
e for any z < t vT (e). Here, the approximate values of 5 ) are tabulated for 

1 < T < 9 and v = 0,1,2,10. 


Here, we wish to truncate (13.511 and use it as a numerical approximation to J v (z + Sz). 
Fortunately, when \5z\ is small (so that z + Sz can be considered as a perturbation of 
z) the Neumann addition formula is a rapidly converging series for J„{z + Sz). 

Lemma 3.1. Let v be an integer, z > 0, and \6z\ < e -1 . Then, for K > 1 we 
have 

K -1 

J v (z + Sz)~ E J v -s{z)J s {Sz) 

s=—K+l 

where e ~ 2.71828 is Euler’s number. 

Proof. Let v be an integer, z > 0, |<fe| < e _1 , and K > 1. Denote the left-hand 
side of the inequality in (13.61) by \A V K (z, Sz)\ so that 

—K oo 

K,k{z,Sz) = E J ^-s{z)J s {Sz) + E J v -s(z)J s (Sz). 

s=—oo S — K 

Since \J v {z)\ < 1 [HI (10.14.1)] and J- V {z) = (— 1)" J v (z) [181 (10-4.1)], we have 

— K oo oo 

\ a v , k ( z , 6 z )\< E i^(^)i + Ei J *( fc )i^ 2 Ei J «( fc )i- ( 3J ) 

s=—oo s—K s—K 


< 5.2 


e |(5z| 


K 


(3.6) 


By Kapteyn’s inequality [HI (10.14.8)] we can bound |J s (<5z)| from above as follows: 

e |<5;r| 


\Js{Sz)\ = ks(s(^/ s ))l ^ 


| Sz/sy e 


S s(l — \Sz/ s\ 2 ) 2 


(l + (1 - l^/s| 2 )') 


< 


2s 


(3.8) 


where the last inequality comes from e x < e(l + x)/2 for 0 < x < 1. Therefore, by 
substituting (13.81) into (13.71) we find that 


\A^ k (z,Sz)\ < 2 E 


s=K 


' |(5^ 


2s 


s 2 ££ 

s=K s—K 


\Sz 


< 5.2 


\Sz 


K 


where we used s~ s < 1.3 and Y^T=o( e \^ z \l^) S = 1/(1 — e|<5z|/2) < 2. □ 

Lemma 13. 1 1 shows that 2Z) s ^Efsr+i ^v-s( z )Jsi^ z ) approximates J v [z + Sz), up to 
an error of e, provided that h.2{e\8z\/2) K < e. Equivalently, for any 


N < - 

e 



K 


(3.9) 













K 

3 

4 

5 

6 

7 

8 

9 

10 

(1331) 

4 x 1CT 6 

9 x 1(T 5 

0.001 

0.002 

0.004 

0.008 

0.013 

0.020 


Table 3.3: Lemma 13.11 shows that Ylf=-K+i J^-s( z )Js(^ z ) is an approximation to 
J v (z + Sz), up to an error of e, for any \Sz\ < 2(e/5.2) 1 ^/e. Here, the values of 
2(e/5.2) 1 ^/e are tabulated for e = 10 -15 and 3 < K < 10. 


Table [331 gives the values of 2(e/5.2) 1 ^/e for e = 10 -15 . 

At first it may not be clear why Lemma 13.11 is useful for practical computations 
since a single evaluation of J v {z + Sz) is replaced by a seemingly trickier sum of 
products of Bessel functions. However, in sections [5] and [G] the truncated Neumann 
addition formula will be the approximation that will allow us to evaluate Fourier- 
Bessel expansions and compute the discrete Hankel transform. The crucial observation 
is that the positive roots of J 0 (~) can be regarded as a perturbed equally-spaced grid. 

3.3.1. The roots of J 0 (z) as a perturbed equally-spaced grid. Let j 0 n 

denote the nth positive root of Jq(z). We know from m that the leading asymptotic 
behavior of J 0 {z) is \J2/(ttz) cos (z — 7r/4) for large 2 . Therefore, since cos ((n — 
1/2)7r) = 0 for n > 1 the leading asymptotic behavior of j 0 ,n is (n — 1/4)7t for 
large n. Similarly, the leading asymptotic behavior of jo,k/jo,N+i is (fc — l/4)/(AT + 
3/4) for large k and N. The next lemma shows that jo,i; ■ • ■ i3o,n anc i the ratios 
Jo,i/io,JV+i) ■ • ■ 1 jo,N/jo,N+\ can be regarded as perturbed equally-spaced grids. 

Lemma 3.2. Let j 0 n denote the nth positive root of J 0 (z). Then, 


and 


Jo,n 



0 < b n < 


1 

8(n- i)7r’ 


JO ,k 
J0,AT+1 



+ e fc,A) 


\ e k,N\ < 


1 

w+w 



Proof. The bounds on b n are given in )12, Thm. 3]. For the bound on |efe jy| we 
have (since j 0)fc = (k - 1/4)tt + b k and j 0 , N +i = (N + 3/4)tt + 6jv +1 ) 


hN 1 

1 

0 


bk{N + f) — b N+1 (k — \) 

Jo,at+i {N + |) 


((N + |)7r + b N+1 )(N + |) 


The result follows from b N+i > 0 and |& fe (A^+3/4) — b N+ i(k— 1/4)| < (A r +3/4)/(8(fc— 
1/4)tt). □ 

Lemma [3.21 shows that we can consider j 01 , .. . , j 0 N as a small perturbation of 
3 tt/4, 7tt/4, ..., (N - l/4)?r and that we can consider j 0> i/jo,jv+i, ■ ■ ■ Jo,n/Jo,n+i as 
a small perturbation of 3/(47V + 3), 7/(4N + 3),..., (47V — l)/(4iV + 3). This is an 
important observation for sections [5] and [6] 

4. Fast evaluation schemes for Schlomilch expansions. A Schlomilch ex¬ 
pansion of a function / : [0,1] —> C takes the form [23] : 

N 

fir) =^2c n J v (nnr). 

n =1 

8 


(4.1) 

















Here, we take v > 0 to be an integer and develop a fast 0(iV(log iV) 2 /loglogTV) 
algorithm for evaluating Schlomilch expansions at rq,... ,r N , where r k = k/N. The 
evaluation of ED at rq,• • •, r N is equivalent to computing the matrix-vector product 
/ = J„(i:w T )c, i-e., 



/ J^rq^i) 
\JA r N^l) 


••• JvinuN) 

Ju{ t N^n), 

■Mr“ T ) 



(4.2) 


where f k = f(r k ) and oj n = mr. The (fc,n) entry of J„(rw T ) is J v (kmr/N ), which 
takes a similar form to the (k + 1, n + 1) entry of C^r +1 , cos (knir/N), and the ( k , n) 
entry of Sjy-i, sm(krm/N). see ( 11 . 31 ) . We will carefully use the asymptotic expansion 
in ED to approximate J„(r w T ) by a weighted sum of C N+1 and S N _i. 

4.1. A simple and fast evaluation scheme for Schlomilch expansions. 

To derive a fast matrix-vector product for J„(rw T ), we first substitute z = into 
the asymptotic expansion ED- By the trigonometric addition formula we have 



As a matrix decomposition we can write this as the following sum of C^r+i and Sn-i: 


COS 




diQ J C N+1 Q + e? 2 


'S J N -1 O' 
0 T 0 


(4.3) 


where 0 is a zero column vector, Q = [0 = cos((2z/ + 1)7 t/ 4), and d 2 = 

sin((2^ + l)7r/4). Similarly, for sine we have 




—d 2 Q J Cjy+iQ + di 


's * N -1 o' 
o T o 


(4.4) 


Thus, by substituting (14.31) and (14.41) into ED with z = r lo t we obtain a matrix 
decomposition for J„(rw T ), 


M -1 


J„(ruj T ) = V ( ~ 1 )^ (t/ ) D - 2m ~i 
m=0 'J 1 '! 2 

M—l , 1 \m / \ 

_ ^ (~1) Q 2m+ i(^) 


diQ T ^ +1 Q + d 2 "JJf 1 Q 


0 


m =0 


\A72 


—d 2 (5 T Cjv-(-iQ + di 


1 O' 

0 T 0 


D, 


D~ 


—2m— i 


R^m(21W T ), 


(4 -5) 

where we used the fact that rw T is a rank 1 matrix. We can write ED more conve¬ 
niently as 


Jz/(m T ) = (zE) + R 


(4.6) 


One could now compute J„ by first computing the vector (ro; T )c 

and then correcting it by Rj,,M(Z3d T )c- The matrix-vector product J^aT(z:w t )c 
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Fig. 4.1: The cause of numerical instability in the matrix decomposition (14.611 . The 
entries in the black region of (rw T ) and R^mCe^ 1 ) are large, of similar magni¬ 
tude, and of opposite sign, resulting in numerical overflow issues and severe cancella¬ 
tion errors when added together. The nonwhite region of Rj, m(e<± 1 T ) contains all the 
entries of R^jif(raj T ) that are larger in magnitude than e. The asymptotic expansion 
should only be employed when |-R„ M( r fc w n)l < e - 


can be computed in 0 (TV log IV) operations since the expression in (14.51) decomposes 
( L w T ) as a weighted sum of diagonal matrices and 2 M DCTs and DSTs. 

While this does lead to an 0(N log N) evaluation scheme for Schlomilch expan¬ 
sions, it is numerically unstable because the asymptotic expansion is employed for 
entries J v {r k u} n ) for which r k u n is small (see Section EblT) . Numerical overflow issues 
and severe cancellation errors plague this approach. Figure H7T1 illustrates the entries 
that cause the most numerical problems. We must be more careful and only employ 
the asymptotic expansion for the (k,n) entry of J v if \Ru,M^ r k u} n)\ < e - 

By Section l3Jl we know that if z > s vM (e) then \R v m( z )\ < £■ Therefore, we 
can use (14.61) for the ( k,n ) entry of J„(ru T ) provided that r k ui n = nkir/N > s vM (e). 
This is guaranteed, for instance, when 

min(fc,n) > [oJV 5 ], (4.7) 

where a = (s v m{£)/' k ) 1 ^ 2 . Therefore, we can be more careful by taking the N x N 

diagonal matrix Q 1 with ( Qi)a = 1 for i > \aN 2 ~\ and 0 otherwise, and compute 
J l/ (ruJ T )c using the following matrix decomposition: 

J„(m T ) = qX S m (rw T )Qi + QiRv,m(lu j )Qi + (4.8) 

where j^ VAL (ru; T ) is the matrix 


/ -rEVAL 


(™ T ))h = 


Ju (Oc^n) ■ 

0 , 


min(fc,n) < \aN 2 ~\, 
otherwise. 


If the ( k,n ) entry of Qi J(^aJ( rw T )(5i is nonzero, then nkn/N > s v M {e) and 
hence, (14.81) only approximates J v (r k ui n ) by a weighted sum of trigonometric functions 
when it is safe to do so. Figure l4~2l shows how the criterion m partitions the entries 
of J„(rw T ). A stable 0(N 3 ^ 2 ) algorithm for evaluating Schlomilch expansions (14.11) 
follows, as we will now explain. 

By construction all the entries of QiR^ i m(cw t )Q 1 are less than e in magnitude 
so its contribution to the matrix-vector product J„(ru; T )c can be ignored, i.e., we 
make the approximation 

J„(rw T )c Rj + J™ L (lw T )c. 
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£ ^ 



Fig. 4.2: A partition of the entries of J„(rw T ) that motivates the matrix decompo¬ 
sition (14.81) . The asymptotic expansion (13.11) is used for the ( k,n ) entry if it satisfies 
min (k,n) > [alV 3 ]. The gray region shows the nonzero entries of J™ L (rw T ) that 
are not dealt with by the asymptotic expansion. The black curve indicates the entries 
satisfying \R Ut M( r k w n)\ = e. A numerically stable 0{N 3 ^ 2 ) algorithm for evaluating 
Schlomilch expansions follows. 


The matrix-vector product can be computed in 0(N log N) op¬ 

erations using DCTs and DSTs by (14.51) . Moreover, the number of nonzero entries 
in j ® VAL (rw T ) is 0(aN 1 ^ 2 N) = 0(N 3 ^ 2 ) and hence, the matrix-vector product 
jEVAL( can kg computed in 0(N 3 ^ 2 ) operations using direct summation. 

This algorithm successfully stabilizes the 0(N log N) matrix-vector product that 
results from (14.61) as it only employs the asymptotic expansion for entries of J„(r fc w n ) 
for which r k co n is sufficiently large. In practice, this algorithm is faster than direct 
summation for N > 100 (see Figure l4~4|) . However, the algorithm has an 0(N 3 ^ 2 ) 
complexity because the criterion m is in hindsight too cautious. We can do better. 

4.2. A faster evaluation scheme for Schlomilch expansions. Note that 
the asymptotic expansion ED is accurate for all 0(N 2 ) entries of J„(rw T ) except 
for 


( N N \ / N \ 

EEWmwi )=° 

n=l fc=1 / \n=l J 

of them, where 1 is the indicator function and the last equality is from the leading 
asymptotic behavior of the TVth harmonic number as N —> oo. Therefore, a lower 
complexity than 0(N 3 ^ 2 ) seems possible if we use the asymptotic expansion for more 
entries of J„(ru; T ). Roughly speaking, we need to refine the partitioning of J„(rw T ) 
so that the black curve, | R v ,M{. r k^n)\ = C in Figure l4~2l is better approximated. 

First, we are only allowed to partition J„(ru; T ) with rectangular matrices since 
for those entries that we do employ the asymptotic expansion we need to be able to use 
DCTs and DSTs to compute matrix-vector products. Second, there is a balance to be 
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found since each new rectangular partition requires 2 M more DCTs and DSTs. Too 
many partitions and the cost of computing DCTs and DSTs will be overwhelming, but 
too few and the cost of computing J^ VAL (rw T )c will dominate. These two competing 
costs must be balanced. 

To find the balance we introduce a parameter 0 < /3 < 1. Roughly speaking, if 
/3 ss 1 then we partition as much as possible and the asymptotic expansion is used for 
every entry satisfying If /? ~ 0 then we do not refine and we keep 

the 0(N 3 ^ 2 ) algorithm from Section |4~T1 An intermediate value of /? will balance the 
two competing costs. 

Figure l4~3l shows the partition of J„(rw T ) that we consider and it is worth care¬ 
fully examining that diagram. The partition corresponds to the following matrix 
decomposition: 


Jy(Zlik T ) ~ (LU T )Qi + ( Qsp+lJ^M (Zhk T )Q2p + Q2pJi/,M (L^J)Q2p+l 

P= 1 ' 

+ jf AL (^ T ), 

(4.9) 

where P is the number of partitions, Q 1 is the diagonal matrix in Section 14.11 and 
the TV x TV matrices Q 2p and Q 2 p+i are diagonal matrices with 


(Q2 p)ii 


1, \af3 P N 2 ~\ <i<N, 
0, otherwise, 


(Q 


2p-\-l)ii 


1, \a/3 p N^~\ < i < [a/3 p+1 N ^\, 


0, otherwise. 


Note that if the ( k,n ) entry of the matrix Q 2 p+i (zikr T )I? 2 p is nonzero, then 
k > \aj3 p N 2 ) and n > \a/3~ p N 2 ]. Hence, nk > a 2 TV and r k uj n > s vM (e). Similarly, 
if the (k, n) entry of (o±f T )Q 2 p+i is nonzero, then r k u> n > m( c )- Therefore, 

we are only employing the asymptotic expansion m on entries of J„( r w T ) for which 
it is accurate and numerically stable to do so. 

Each matrix-vector product of the form Q 2p +i J)^ SY ( r w T )Q 2p c requires 2 M DCTs 
and DSTs from (14.51) and hence, costs O(TVlogiV) operations. There are a total of 
2 P + 1 matrices of this form in (14.91) so exploiting the asymptotic expansion requires 
O(PTVlogTV) operations. 

The cost of the matrix-vector product j® VAL ( r w T )c is proportional to the number 
of nonzero entries in 3 U (rw T ). Ignoring the constant a, this is approximately (by 
counting the entries in rectangular submatrices of J™ L (rw T ), see Figure fOl) 


2/3 P N% + 2 V j3~ p N^ (/ 3 p ~ l N * - /3 P N?) = 2f3 P N? + 2PN ( 2- - 1 ) = O ( PN//3) 


P= 1 


P 


where the last equality follows from the assumption that fi P N 1 ^ 2 = 0(1). 

To balance the competing 0(PN log TV) cost of exploiting the asymptotic ex¬ 
pansion with the 0(PN/j3 ) cost of computing j® VAL ( rw T )c, we set PN log N = 
PN//3 and find that /3 = 0(l/logTV). Moreover, to ensure that the assumption 
/3 P N 2 = 0(1) holds we take P = O (log TV/ log j3) = O (log TV/ log log TV). Thus, the 
number of partitions should slowly grow with TV to balance the competing costs. 
With these asymptotically optimal choices of (3 and P the algorithm for comput¬ 
ing the matrix-vector product J„(ru T )c via (14.91) requires 0(PN log TV + PN//3) = 
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Fig. 4.3: A partition of the entries of J„(ru/ r ) that motivates the matrix decompo¬ 
sition dUD - The gray region shows the nonzero entries of J™ L (rw T ) that are not 
dealt with by the asymptotic expansion. The black curve indicates the entries satis¬ 
fying \Rv t M{ r k w n)\ = e - The asymptotically near-optimal choice of the parameter /? 
is 0(1/log N) and the number of partitions should grow like 0(log N/ loglog N) with 
N. A numerically stable 0 (IV (log TV) 2 / loglog N) algorithm for evaluating Schlomilch 
expansions follows. 


0(iV(logfV) 2 /loglog N) operations. More specifically, the complexity of the algo¬ 
rithm is 0(N(\ogN) 2 log(l/e)/logloglV) since M = 0(log(l/e)), see Figure ECT 

Though, the implicit constants in /3 = 0(1/log N) and P = O (log N/ loglog N) 
do not change the complexity of the algorithm, they must still be decided on. After 
numerical experiments we set /3 = min(3/ log N, 1) and P = [log(30a -1 7V -1 ^ 2 )/ log ff\ 
for computational efficiency reasons. 

Table [4711 summarizes the algorithmic parameters that we have carefully selected. 
The user is required to specify the problem with an integer is, a vector of expansion 
coefficients c in ED, and then provide a working accuracy e > 0. All other algorithmic 
parameters are selected in a near-optimal manner based on analysis. 

Remark 4.1. A fast evaluation scheme for f(r) = X^=i c nT„((n. + 7)7r?’), where 
7 is a constant, immediately follows from this section. The constants di and d 2 
in ED and (I4~4l) should be replaced by diagonal matrices D v and D w , where 14 . = 
cos(— r ykn/N + (2 is + 1)7t/4) and w k = sm^-ykn/N + (2 is + 1)7t/4), respectively. 

4.3. Numerical results for evaluating Schlomilch expansions. We now 

compare three algorithms for evaluating Schlomilch expansions: (1) Direct summa¬ 
tion (see Section [5D; (2) Our 0(N 3 ^ 2 ) algorithm (see Section 14. ip . and; (3) Our 
d(7V(log N) 2 / loglog N) algorithm (see Section l4~2l) . The three algorithms have been 
implemented in MATLAB and are publicly available from [25]. 

Figure |T4] (left) shows the execution time for the three algorithms for evaluating 
Schlomilch expansions for v = 0. Here, we select the working accuracy as e = 10 15 
and use Table im to determine the other algorithmic parameters. We find that our 
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Parameter 

Short description 

Formula 

M 

Number of terms in (13.11) 

max( [0.3 log(l/e)J, 3) 

s v,M( e ) 

z > s v M (e) => \R v>m (z)\ < e 

see Table l3Tl 

a 

Partitioning parameter 

( s j/,M( e )/ 7r ) 1/2 

p 

Refining parameter 

min(3/log A, 1) 

p 

Number of partitions 

[log(30a 1 A -1//2 )/ log /?] 


Table 4.1: Summary of the algorithmic parameters for computing (14.21) . The user is 
required to specify the problem with an integer v 1 a vector of expansion coefficients c 
in dm and provide a working accuracy e > 0. All other algorithmic parameters are 
carefully selected based on analysis. 



Fig. 4.4: Left: Execution time for computing (14.21) with 0(N 2 ) direct summation, 
the 0(N 3 ^ 2 ) algorithm (see Section POT) . and the (H(A(log A) 2 /loglog A) algorithm 
(see Section 14.21) . Right: Computational timings for A = 5,000 with 2 < M < 20 
and e = 10 -15 ,10 11 ,..., 10~ 3 . Numerical experiments like these motivate the choice 
M = max(L0.31og(l/e)J,3) in (14.51) . 


0(A(logA) 2 /loglogA) algorithm in Section ITT21 is faster than direct summation for 
A > 100 and faster than our 0(N 3 ^ 2 ) algorithm for A > 160. In fact, for A < 158 
the number of partitions, P, is selected to be 0 and the 0(N(log N) 2 / loglog A) 
algorithm in Section Pi. 2 1 is identical to the 0(N 3 ^ 2 ) algorithm in Section Pi. 1 1 For 
a relaxed working accuracy of e = 10 -3 or e = 10 ~ 8 our 0(N(log N) 2 / loglog A) 
algorithm becomes even more computationally advantageous over direct summation. 

Figure S31 (right) shows the execution time for our 0(N(logN) 2 / loglog N) algo¬ 
rithm with N = 5,000,1 < M < 20, and working accuracies e = 10~ 15 ,10 —11 ,..., 10 ~ 3 
For each e > 0 there is a choice of M that minimizes the execution time. Numerical 
experiments like these motivate the choice M = max(L0.31og(l/e)J, 3) in (14.51) . 

5. A fast evaluation scheme for Fourier—Bessel expansions. In this sec¬ 
tion we adapt the 0(N(logN) 2 /loglogN) algorithm in Section IT~?1 to an evaluation 
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scheme for expansions of the form: 


N 

fir) = ^2, c nJ v (rj 0>n ), r e [o, 1], 

n= 1 


(5.1) 


where v is an integer and j 0 n is the nth positive root of J 0 . For v = 0, (15.11) is the 
Fourier Bessel expansion of order 0 of the function / : [0,1] —> C. The algorithm 
that we describe evaluates EH at r k = k/N for 1 < k < N, which is equivalent 
to computing the matrix-vector product Jj,(n 2 j)c, where j_ 0 is a column vector with 

entries j 0 i i,... 

Developing a fast algorithm for computing the matrix-vector product J„(r 2 .j)c 
is a challenge for FFT-based algorithms because the positive roots of J 0 are not 
equally-spaced. Here, we are able to derive a fast evaluation scheme by considering 
jo i,..., jo.v as a perturbed equally-spaced grid (see Section 13.3. ID . 

First, we derive a matrix decomposition for J„(w {v + w) J ), where u, v, and w 
are column vectors. Afterwards, we will consider w + was a perturbation of v. 

Proposition 5.1. For integers v and N > 1 and for column vectors u, v, and 
w of length N, we have the following matrix decomposition: 


J „(u(v + w) J ) 


E E 


(—l)* 2 _2t_s 
t\(t + s)! 


D 


2 i —(— s 




2 


(5.2) 


Proof. We prove (15.21) by showing that the (k, n ) entry of the matrix on the 
left-hand side and right-hand side are equal for 1 < k,n< N. 

Pick any 1 < k, n < N. By the Neumann addition formula (13.51) we have 

OO 

(j v(u(v + w) T )) kn = J„(u k Vn + u k w n ) = ^2 J v - s {u k v n )J s (u k w n ). (5.3) 

s =—OO 


Moreover, by applying the Taylor series expansion of Bessel functions (13.31) we have 

(5.4) 


oo / 2 — 

^s (V'k'^n') ^ ^ 7777 ! El (^k^n) 


2 t+s 


t =0 


t\(t + s)\ 


By substituting (|5.4p into (|5.3 [) we obtain 


OO OO 


(j v(u{v + w) T )) kn = E ( ,u! 2 „M J v-s(u k v n )(u k w n f t+s 


S ——oo t= 0 


t\(t + s)! 


and the result follows. □ 

We now wish to write down a particular instance of (15.21) that is useful for com¬ 
puting J !/ (h2o)c- By Lemma [3~2l we can write j 0n = ui n + b n , where ui n = (n — l/ 4 ) 7 r 
and b n is a number such that 0 < b n < 1/(8(n — 1/4)7 t). In vector notation we have 
2o = Q. + b. Hence, by Proposition 15. II we have 


•Mziio) = J v(r(u + b) J ) 


E E 


(_l)t 2 -2t- S 

t\(t + s)! 


D 2 r t+S J 


V — S 


(ru T )D 


2 t+s 
b 


(5.5) 


This is a useful matrix decomposition since each term in the double sum can be applied 
to a vector in 0(A r (logfV) 2 /loglog7V) operations. The diagonal matrices D 2t+S an d 
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D 2t+S ha veO(N) matrix-vector products and J I y_ s (?:w T ) has 0(iV(log N) 2 / loglog N) 
matrix-vector products (see Section R31 aud Remark 14.11) . However, for (15.511 to be 
practical we must truncate the inner and outer sums. 

Let K > 1 be an integer and truncate the outer sum in (15.51) to Y2^=22k+v By 
Lemma 13.11 with z = r k Cb n and 5z = r k b nl the truncated Neumann addition formula 
is still accurate, up to an error of e, for the ( k,n ) entry of J„(;r.2.J) provided that 


\r k K I < \K\ < 


_ l -<-{—)* 

8 [n — j)n e \5.2/ 


Solving for n we find that 


a^i-i «>*<«)■ 


(5.6) 


That is, once we truncate the outer sum in (15.51) to 'Y^=-k+i the matrix decompo¬ 
sition is still accurate for all the columns of J^rio) except the first LPif( e )J- For 
example when K = 6, p 6 (10 -15 ) ~ 22.7 and hence, once the outer sum is truncated 
we must not use (ESI) on the first 22 columns of Ji,(zL2o)- Based on numerical ex¬ 
periments we pick K > 1 to be the smallest integer so that Pk{ £ ) < 30 and hence, 
K = 0{ log(l/e)). 

Next, we let T > 1 be an integer and truncate the inner sum to Y^t= o • By 
Section [3.21 the truncated Taylor series expansion is accurate, up to an error of e, for 
the (fc, n) entry of 3 l ,(rjJ ) ) provided that 

Mnl < \K\ < < _ .min t a>T (e) « (2 2T (T!) 2 e) 2T , 

8(n—j)7T 0<.<Jf-i V / 

where the minimization is taken over 0 < s < K — 1 , instead of — K + l<s<iv — 1, 
because of the relation J_ s (z) = (—1 ) s J s (z) (IS] (10.4.1)]. Solving for n we find that 


e — 1/(2 T) 1 

n — - TFr "b T 

16tt(T!) 1/t 4 


9t( £ )- 


(5.7) 


Thus, once the inner sum is truncated in (15.51) to Y^t=o the decomposition is accurate 
for all but the first L9r( e )J columns of J„(2I2o)- For example when T = 3, < 7 3 (10 —15 ) ss 
3.7 and we must not use (|5.5I) on the first 3 columns of J„(ll2o)- In practice, we select 
T > 1 to be the smallest integer so that < 7 T (e) < 30. 

To avoid employing the decomposition (15.51) with truncated sums Y^ K k+i *121=0 
on the first (max(p^-(e), gj’(e))J columns of we partition the vector c. That 

is, we write J„(rjJ)c = 3„(rjJ ) )c 1 + J„(rio)c 2 > where 


(lll)ra 


c n , n> [max(p K (e), q T (e))\, 
0, otherwise, 


G; 2)11 


c n , n < [ma,x{p K (e),q T (e))\, 
0, otherwise. 


First, we compute / = J I/ (r. 2 o)£i using (15.51) with truncated sums Y 2 K k +1 Y 2 t=0 m 
C2(iV(log N) 2 / loglog N) operations and then compute / = J v (rj] ) )c 2 using direct 
summation in 0{N) operations. 
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5.1. Rearranging the computation for efficiency. At first it seems that 
computing J„(rio)c is (2K—1)T times the cost of evaluating a Schlomilch expansion, 
since there are ( 2K — 1 )T terms in X^A'+i Y^t=o ■ However, the computation can be 
rearranged so the evaluation of (EH) is only 2T + K — 2 times the cost. For K = 6 
and T = 3, the values that we take when e = 10 -1 , we have (2 K — 1 )T = 33 and 
2 T + K — 2 = 10 so there is a significant computational saving here. 

Since J_ s (z) = (—1 ) s J s (z) [18] (10.4.1)] we have 


K -1 T—l ( t 2t-s 

J v(rjl)c= ^2 £ xi/x , -n D ^ +Sj u- s (rQ J )Dl t+s c 


S—-K+ 1 t=0 


f!(f + s)! 


T-l , ^ 2t-s 

1 -£)^ +s ]J„_ s (rw T ) + (-l) s J„ +s (rw T )] -Df +S c 


XV — _L J. — X 

= ££ 

s=0 o 

2T+X-3 

= EA 

ia=0 


f!(f + s)! 

min(Lfj,T-l) t „ 

E " l -1 ! ^ 

_ K t!(u - t)\ ' 

i=max( [ ——,0) 




nit 

D b c, 


where the single prime on the sum indicates that the first term is halved and the 
double prime indicates that the last term is halved. Here, the last equality fol¬ 
lows by letting u = 2t + s and performing a change of variables. Now, for each 
0 < u < 2T + K — 3, the inner sum can be computed at the cost of one appli¬ 
cation of the C>(N(log N) 2 / loglog N) algorithm in Section 14.21 Since each evalua¬ 
tion of a Schlomilch expansion costs <D(N(log N) 2 log(l/e)/loglog N) operations and 
K = 0(log(l/e)), a Fourier-Bessel evaluation costs 0(N(log N) 2 log(l/e) 2 / loglog N) 
operations. 

5.2. Numerical results for evaluating Fourier-Bessel expansions. Fig¬ 
ure 15.11 (left) shows the execution time for evaluating Fourier-Bessel expansions of 
order 0 with direct summation and our 0(N (log N) 2 / loglog N) algorithm with work¬ 
ing accuracies of e = 10 -1 °, 10 8 ,10 3 . Our algorithm is faster than direct summation 
for N > 700 when e = 10 . Relaxing the working accuracy not only decreases K and 

T, but also speeds up the evaluation of each Schlomilch expansion. When e = 10 —3 , 
our algorithm is faster than direct summation for N > 100. 

The algorithmic parameters are selected based on careful analysis so that the eval¬ 
uation scheme is numerically stable and the computed vector / has a componentwise 
accuracy of (^(ellclli), where || • 1^ is the vector 1-norm. In Figure [5J] (right) we verify 
this for e = 10~ 15 ,10 -8 , 10 -3 by considering the error ||/ — / quad ||ooi where || • Hoc 
is the absolute maximum vector norm and / quad is the vector of values computed 
in quadruple precision using direct summation. For these experiments we take the 
entries of c as realizations of independent Gaussians so that ||c|| x = 0(N 1 ^ 2 ). 

6. The discrete Hankel transform. We now adapt the algorithm in Section[5] 
to the computation of the discrete Hankel transform (DHT) of order 0, which is defined 
as .[15] 

n 

fk = £ c n^o(io,feio,n/io,JV+l)j 1 < k < N , (6.1) 

n =1 

where j 0 n is the nth positive root of Jq(z). It is equivalent to the matrix-vector 
product Jo(lo2o/io.iv+i)l' and is more difficult for FFT-based algorithms than the 
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Fig. 5.1: Left: Execution time for evaluating (15.11) at 1/JV,..., N/N with 0(N 2 ) di¬ 
rect summation and the 0(iV(log./V) 2 /loglog TV) algorithm described in this section. 
For a relaxed working accuracy our algorithm becomes even more computationally 
advantageous over direct summation. Right: Maximum absolute error in the com¬ 
puted vector /. Here, we compare the observed error (solid lines) with the expected 
bound of (^(ellcll!) (dotted lines). 


task in Section [5] because it may be regarded as evaluating a Fourier-Bessel expansion 
at points that are not equally-spaced. 

To extend our 0(./V(log TV) 2 /loglog N) algorithm to this setting we consider the 
ratios jo l/jo N+ii ■ ■ ■ > Jo n/Jon+i as a perturbed equally-spaced grid. By Lemma HO 
we have 


jo,fc 
Jo.at+i 



+ e fc,JV> 


l e fe,jv| < 


1 

w+w 



1 <k<N. 


We can write this in vector notation as lo/jo.iv+i = L+<bv, where f k = (fc—1/4)/(iV-F 
3/4). Since J 0 ( (v + w)u T ) = J 0 (m(h + 1£) t ) t and J 0 (ioio/io,JV+i) = J 0 ((£ + e N )H), 
we have by Theorem 15.11 the following matrix decomposition: 


oo oo 

Jo(ioio/io,JV+l) = J3 53 

s =—oo £=0 


(~lj t 2~ 2t ~ s 
t\(t + s)! 


D 2 e i +s J- s (rjZ)Dl 


lo 


( 6 . 2 ) 


Each term in m can be applied to a vector in 0(N(log N) 2 / loglog N) op¬ 
erations. The diagonal matrices Dj^ +S and D 2t + S have O(N) matrix-vector prod¬ 
ucts and by Section [5] the matrix-vector product J_ s (L2o)c can be computed in 
0(N (log N) 2 / loglog N) operationsQ However, for (16.21) to be practical we must trun¬ 
cate the double sum. 

Let K > 1 and T > 1 be integers and consider truncating the sum in (16.21) to 
s -l2 K , 1 Y^t=o ■ Using similar reasoning to that in Section [5j if we truncate (16.21) 
then the matrix decomposition is accurate, up to an error of e, provided that 


Ef=_ 


\ e k,NJo,n\ < 


J0,r 


1 + 


8 (IV ■ 




< 


8(n-|)(JV+|)7r 

8 (k - i)7T 


< 


1.01 


< 


8(fc-i)7r e 


(£)' 


2 Here, r k = (k — l/4)/(A r + 3/4) = (4fc — l)/(4A r + 3) so to compute J_ s (£ 2q)c pad the vector 
c with 3 N + 3 zeros, use the algorithm in Section [5] with N replaced by 4 N + 3, and then only keep 
hi -1 for 1 < i < N. 
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and 


l e fe,JvJo.nl < sr , L01 i. < min t aT (e) « (2 2T (T!) 2 e) 2T . 

8(fc — j)TT 0<s<K — l \ / 

where we used the bound j 0 n / ((iV+3/4)7r) < (n — l/4)/(7V+3/4) + l/(8(n—1/4) (IV+ 
3/4)7 t 2 ) < 1.01 for n,N > 1. Equivalently, provided that 

k > 1.01 x ma x(p K (e), q T (e)), 

where p^(e) and Qri 6 ) are defined in (15.61) and (15.71) . respectively. That is, after 
truncating the sums in dnuD we can use the decomposition on all the entries of 
Jo(loio/io jv+i) except for those in the first [1-01 max(p K (e), q T (e))J rows. In prac¬ 
tice, we take K > 1 and T > 1 to be the smallest integers so that 1.01 ma x(p K (e), g r (e)) < 
30. 

To avoid employing a truncated version of (16.21) on the first few rows we compute 
/ = J o( lo io/j'o,JV+l)c using (JOJ) with sums Y,*=-k+ i T,t=o in 0(N(l og N) 2 / loglog N ) 
operations, discard the first [1-01 max(p^(e), q T (e))\ entries of /, and then compute 
the discarded entries using direct summation in O(N) operations. 

In the same way as Section 15.11 we can rearrange the computation so that only 
2 T + K — 2, instead of (2 K — 1 )T, Fourier-Bessel evaluations are required. Each 
Fourier-Bessel evaluation requires 2 T+K— 2 Schlomilch evaluations. Hence, a discrete 
Hankel transform requires (2 T + K — 2) 2 Schlomilch evaluations. Since K = log(l/e), 
our algorithm for the DHT requires 0{N(\ogN) 2 log(l/e) 3 /loglog N) operations. 

The exposition in this section may suggest that the same algorithmic ideas con¬ 
tinue to work for v > 0. While this is correct, the computational performance rapidly 
deteriorates as v increases. This is because the vector of Bessel roots j_ u and the 
vector of ratios jJ,/j v ,N+i are distributed less like an equally-spaced grid for v > 0. 
This significantly increases the number of terms required in the Taylor expansion and 
Neumann addition formula. For large v > 0, different algorithmic ideas should be 
employed, for example, one should take into account that | J v (z)\ can be very small 
when 0 < z < v. It is expected that the matched asymptotic (transition) region of 
J v (yZ), when z ~ v, will be the most difficult to exploit when deriving a fast algorithm. 

6.1. Numerical results for computing the discrete Hankel transform. 

We have implemented our 0(N(log N) 2 / loglog N) algorithm for computing the DHT 
in MATLAB. It is publicly available from [251 . 

Figure EH] (left) shows the execution time for computing the DHT of order 0 with 
direct summation and our 0(N (log IV) 2 / loglog N) algorithm with working tolerances 
of e = 10 1 °, 10 8 ,10 -3 . Our algorithm is faster than direct summation for N > 6,000 
when e = 10~ 15 , for N > 2,000 when e = 10~ 8 , and for N > 100 when e = 10~ 3 . 
Figure 16.11 (right) verifies that the selected algorithmic parameters do achieve the 
desired error bound of ||/_ / quad ||oo = ^( e l|c||i), where / is the vector of the computed 
values of m and j’ quad is the vector of values computed in quadruple precision using 
direct summation. In this experiment we take the entries of c to be realizations of 
independent Gaussians (with mean 0 and variance 1) so that ||c|| x = 0(N ' ). 

The DHT is, up to a diagonal scaling, its own inverse as N —> oo. In particular, 
by [T5] we have, as N —> oo, 

— -Jo(2oiJ/j'o.iv+i)AA(2oiS/jo,iv + i)A, ~ -/vi (6-3) 

Jo.at+i 
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Fig. 6.1: Left: Execution time for computing the DHT in (16.11) using direct summation 
and our d?(lV(log N) 2 /\og\ogN) algorithm with e = 10 -15 ,10 8 ,10~ 3 . Right: The 
maximum absolute error in the computed vector /. Here, we compare the computed 
error (solid lines) with the expected error bound 0(e||c||i). 


where D v is a diagonal matrix with entries v n = 2/J 1 (j 0 n ) 2 , and I N is the N x N 
identity matrix. We can use the relation (16.31) to verify our algorithm for the DHT. 
Figure HOI shows the error ||c — cll^, where c is a column vector with entries drawn 
from independent Gaussians and 

C= - 2 —— Jo( 2o iS/jo.iv+i) A,Jo(io ll/jo,N+i)DvC. (6.4) 

Jo,JV+l 

For small N, ||c— will be large because m only holds as N —► oo. However, even 
for large N, we observe that ||c — cH^ grows like 0(N 3 ^ 2 ). This can be explained by 
tracking the accumulation of numerical errors. Since the entries of c are drawn from 
independent Gaussians and v n = 0(n) we have || x = 0(N 3 ^ 2 ) and hence, we 
expect ||Jo(io 2 o/io,Ar+i)- D «c|| 00 = O(e||Z?„c|| x ) = 0(eN 3/2 ). By the same reasoning 
we expect ||Jo(io 2 o/io,JV+i)-D 1 ;Jo(ioio/ 2 o,iv+i)-D il c|| 00 = (D(eN 7/2 ). Finally, this 
gets multiplied by j^N+i = 0(N~ 2 ) in (16.41) so we expect ||c— cIIqq = 0(N 3 ^ 2 ). If in 
practice we desire ||c — clloo = 0(1), then it is sufficient to have \c n \ = o(n~ 2 ). 

Conclusion. An asymptotic expansion of Bessel functions for large arguments is 
carefully employed to derive a numerically stable 0(N(\og N) 2 / loglog N) algorithm 
for evaluating Schlomilch and Fourier-Bessel expansions as well as computing the 
discrete Hankel transform. All algorithmic parameters are selected based on error 
bounds to achieve a near-optimal computational cost for any accuracy goal. For a 
working accuracy of 10 15 , the algorithm is faster than direct summation for evaluat¬ 
ing Schlomilch expansions when N > 100, Fourier-Bessel expansions when N > 700, 
and the discrete Hankel transform when N > 6,000. 
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Fig. 6.2: The maximum absolute error ||c — c||, where c and c satisfy (IQ) . 
Here, the DHT was computed with direct summation and our algorithm with 
e = 10~ 3 ,10 —8 ,10 _1 °. As N —> oo the DHT is its own inverse, up to a diagonal 
scaling. For N < 700 the error ||c — c|| decays like 0{N~ ) because <16.311 only holds 
asymptotically as N —> oo. For N > 700 the error ||c — c|| grows like 0(N 3 ^ 2 ) from 
numerical error accumulation. 
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